Mining Time Series Motifs with Probabilistic Guarantees

Matteo Ceccarello

University of Padova

Johann Gamper

University of Bolzano

Top-\(k\) motifs

Consider all pairs of (non-overlapping) subsequences of length \(w\) of a given time series. The first \(k\) by increasing distance are the top-\(k\) motifs.

State of the art: Matrix Profile

State of the art: Matrix Profile

  • For each subsequence, find the nearest neighbor
  • Find the pair at minimum distance
  • Leverage the fact that consecutive subsequences share a lot of structure
  • Parallelize with many GPUs
  • Still, it’s a \(\Omega(n^2)\) algorithm

  • Finds motifs out of 1 billion long time series in one day, using 5 GPUs

Goal

  • Find the top motifs without computing all-pairs-distances
  • We need an index

Challenges

  • Subsequences can be seen as vectors
  • These vectors are high dimensional
  • Curse of dimensionality: indices such as R-trees degrade to linear scans

To address these challenges, we will use an approach based on Locality Sensitive Hashing (LSH)

PCA-projection of some subsequences

We consider the z-normalized Euclidean distance \[ d(x, y) = \sqrt{\sum_{i\in[1,w]} \left( \frac{x_i - \bar{x}}{\sigma_x} - \frac{y_i - \bar{y}}{\sigma_y} \right)^2 } \] In this example we have a window length \(w = 640\),
hence we have vectors in \(\mathbb{R}^{640}\)

For \(r \in \mathbb{R}^+\), \(\vec{a} \sim \mathcal{N}(0,1)^w\), and \(b \sim \mathcal{U}(0,r)\) Hash function: \[ h(\vec{x}) = \left\lfloor\frac{\vec{a} \cdot \vec{x} + b}{r}\right\rfloor \]

The key point is that we only compute the distance of subsequences falling
into the same bucket.

To lower the collision probability we concatenate \(k\) hash functions \[ \hat{h}(\vec{x}) = \langle h_1(\vec{x}), \dots, h_k(\vec{x}) \rangle \] this makes for a better precision of the index.

And to increase the recall of the index we repeat \(L\) times.

Locality sensitive hashing

Correctness

  • Repeating the hashing procedure several times we will see close pairs of subsequences at least once.

  • We maintain a priority queue of the top pairs seen across repetitions.

  • By performing enough repetitions, we can ensure that it is unlikely that we missed pairs that are closer than the current top-\(k\) pairs.

  • Formally, we can ensure that we find the exact top-\(k\) motifs with probability \(1-\delta\), for a user defined \(\delta\).

  • The smaller the \(\delta\), the higher the number of repetitions.

Dealing with parameters

  • LSH features a lot of parameters
  • Correctness is ensured for each combination, but efficiency is very sensitive to parameters
  • We adopt an approach that auto-tunes parameters based on the data

  • \(L_{max}\) maximum number of repetitions,
  • \(K_{max}\) maximum number of concatenations (e.g. 4).

Repetition 1

Repetition 2

In each iteration we compute the distance of all subsequences in the same bucket.

In the first iteration, with \(k=4\), we discover a pair at distance 2
.

After 10 repetitions, we find a pair at distance 1
.

After 100 repetitions we did not find a better pair,
and the success probability is about 0.85

We then consider shorter prefixes of the hashes,
going through the 100 repetitions again.

After 15 repetitions, we observe that the
success probability is above our target, and thus return.

Complexity

Theorem 1 The LSH index construction takes time \[ O(K_{max} \cdot \sqrt{L_{max}} n\log n) \]

Theorem 2 Let \(d(m_k)\) be the distance of the \(k\)-th motif, and \(i'\le K\), \(j' \le L\) be the parameters used by the optimal LSH algorithm. Then, the algorithm considers \[ O\left( j'\sum_{m\in T^w\times T^w} p(d(m))^{i'} + (L-j')\sum_{m\in T^w\times T^w} p(d(m))^{i'+1} \right) \] pairs in expectation.

Optimizations

  • Use a trie data-structure to re-use computations across iterations at different \(k\) values
  • Compute dot producs for hash values in the frequency domain (also done in some implementations of the Matrix Profile)
  • Compute fewer hash values using tensoring

Experimental results

Find the top-10 motifs.

dataset \(n\) (millions) RC
astro 1 8.63
GAP 2 9.17
freezer 7 7.95
ECG 7 109.06
HumanY 26 581.03
Whales 308 21.66
Seismic 1000 274.44

Relative Contrast measures difficulty: higher is easier. \[ RC = \frac{d_{avg}}{d_{m_k}} \]

Difficult datasets

Data from LIGO:

  • Length 100k, window 1k
  • Top motif distance \(\approx 40\)
  • Average distance \(\approx 44\)
  • Relative contrast \(\approx 1.1\)
  • Attimo: 6 hours
  • SCAMP: \(\approx 1\) second

Scalability

Synthetic data, planted motifs with different relative contrasts. SCAMP-gpu only has one line since it is data-independent.

Practical takeaways

  • Attimo only provides information about the top motif(s), whereas the Matrix Profile provides other informtaion.

  • For shorter time series, or if the relative contrast is very small, use the Matrix Profile.

  • For time series of a few million values and above, with a not-to-small relative contrast, try Attimo

References

  • Matteo Ceccarello, Johann Gamper: Fast and Scalable Mining of Time Series Motifs with Probabilistic Guarantees. Proc. VLDB Endow. 15(13): 3841-3853 (2022)

https://github.com/Cecca/attimo



import pyattimo
ts = pyattimo.load_dataset("ecg", prefix=1000000)
motifs = pyattimo.MotifsIterator(
    ts, 
    w=1000, 
    max_k=100
)
m = next(motifs)
print(m)
motif: (723776, 796052) d=0.4335137534580184

Appendix

Influence of the maximum number of repetitions

Running for top-10 motifs, for different number of repetitions.

freezer and the 7-th motif

7M points, window 5000

  • In black are the distances of the top-10 motifs extracted from the matrix profile.
  • In red the distance of a pair of subsequences neither of which is the nearest neighbor of the other, and not overlapping with higher-ranked motifs.
  • The matrix profile holds the distances and indices of the 1-nearest neighbor of each subsequence, but top-k motifs would require the k-nearest neighbors to be maintained in the matrix profile.

Top-k and k-NN

  • Solid lines are nearest neighbor distances
  • The dashed line is the distance of the top-2 pair in the definition we are using
  • The Matrix Profile contains information about 1-nearest neighbors (solid black lines)